Developing a test collection for biomedical word sense disambiguation
Authors
Abstract
Ambiguity, the phenomenon that a word has more than one sense, poses difficulties for many current Natural Language Processing (NLP) systems. Algorithms that assist in the resolution of these ambiguities, i.e. which make unambiguous a word, or more generally, a text string, will boost performance of these systems. To test such techniques in the biomedical language domain, we have developed a Word Sense Disambiguation (WSD) test collection that comprises 5,000 unambiguous instances for 50 ambiguous UMLS Metathesaurus strings.
Similar resources
Generating quality word sense disambiguation test sets based on MeSH indexing
Word sense disambiguation (WSD) determines the correct meaning of a word that has more than one meaning, and is a critical step in biomedical natural language processing, as interpretation of information in text can be correct only if the meanings of their component terms are correctly identified first. Quality evaluation sets are important to WSD because they can be used as representative samp...
Sense-Based Biomedical Indexing and Retrieval
This paper tackles the problem of term ambiguity, especially for biomedical literature. We propose and evaluate two methods of Word Sense Disambiguation (WSD) for biomedical terms and integrate them to a sense-based document indexing and retrieval framework. Ambiguous biomedical terms in documents and queries are disambiguated using the Medical Subject Headings (MeSH) thesaurus and semantically...
Self-training and co-training in biomedical word sense disambiguation
Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction, attempting to select the proper sense of ambiguous words. Due to the scarcity of training data, semi-supervised learning, which profits from seed annotated examples and a large set of unlabeled data, are worth researching. We present preliminary results of two semi-supervised learnin...
Resolving ambiguity in biomedical text to improve summarization
Access to the vast body of research literature that is now available on biomedicine and related fields can be improved with automatic summarization. This paper describes a summarization system for the biomedical domain that represents documents as graphs formed from concepts and relations in the UMLS Metathesaurus. This system has to deal with the ambiguities that occur in biomedical documents....
Semantic Relatedness for Biomedical Word Sense Disambiguation
This paper presents a graph-based method for all-word word sense disambiguation of biomedical texts using semantic relatedness as edge weight. Semantic relatedness is derived from a term-topic co-occurrence matrix. The sense inventory is generated by the MetaMap program. Word sense disambiguation is performed on a disambiguation graph via a vertex centrality measure. The proposed method achieve...
Journal title:
- Proceedings. AMIA Symposium
Volume, issue
Pages -
Publication date 2001